18/03/2019
Updates
Register in GitHub Classroom
Timeline of Group Examination
- Examination handed out via GitHub (Classroom): Friday, 29 March 2019 (17:00)
- Deadline to hand in group examination results: Friday, 26 April 2019 (17:00)
Format of Group Examination
- GitHub Classroom group assignment
- Basic starter code handed out as repository.
- A data analytics project based on a large data set, including the entire data pipeline.
- Tasks
  - Instructions in README
  - Improve efficiency of given code
  - Extend code: complete specific tasks
  - Explain/document procedure (conceptual understanding)
- 'Product': the repository, including the R code and a report in R Markdown.
Schedule
- Introduction: Big Data, Data Economy (Concepts). M: Walkowiak (2016): Chapter 1
- Programming with Data, R Refresher Course (Concepts/Applied). M: Walkowiak (2016): Chapter 2
- Computation and Memory (Concepts)
- Cleaning and Transformation of Big Data (Applied). M: Walkowiak (2016): Chapter 3: p. 74‐118.
- Aggregation and Visualization (Applied: data tables, ggplot). M: Walkowiak (2016): Chapter 3: p. 118‐127. C: Wickham et al. (2015), Schwabish (2014).
- Data Storage, Database Interaction with R. M: Walkowiak (2016): Chapter 5.
- Distributed Systems, MapReduce/Hadoop with R (Concepts/Applied). M: Walkowiak (2016): Chapter 4.
How to fork the course repository
Recap Week 4
Beyond memory
- RAM is not sufficient to handle the amount of data to be analyzed…
- What to do?
- Scale up by using part of the available mass storage (hard disk) as virtual memory
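
The idea of keeping the bulk of the data on disk and holding only part of it in RAM at any time can be illustrated in base R. The following is a minimal sketch, not the operating system's virtual memory mechanism itself: it writes a numeric vector to a temporary binary file and then computes its sum chunk by chunk. The function name `chunk_sum`, the chunk size, and the (deliberately small) example data are assumptions made for illustration.

```r
# Write a numeric vector to a binary file on disk. It stands in for a data
# set that would not fit into RAM; here it is deliberately small.
n <- 1e6
path <- tempfile(fileext = ".bin")
writeBin(as.numeric(seq_len(n)), path)

# Hypothetical helper: sum the values stored in the file while keeping
# at most `chunk_size` of them in memory at once.
chunk_sum <- function(path, chunk_size = 1e5) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  total <- 0
  repeat {
    # Read the next block of doubles from disk into RAM
    chunk <- readBin(con, what = "numeric", n = chunk_size)
    if (length(chunk) == 0) break  # end of file reached
    total <- total + sum(chunk)
  }
  total
}

result <- chunk_sum(path)
unlink(path)  # remove the temporary file
```

With virtual memory, the operating system moves data between RAM and disk transparently; in this sketch the same trade-off (less RAM used, more disk reads) is made explicitly in the code.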
Virtual memory
